Abstract

Bantu speech communities expanded over large parts of sub-Saharan Africa within the last 4000-5000 years, reaching 
different parts of southern Africa 1200-2000 years ago. The Bantu languages subdivide into several major branches, with
languages belonging to the Eastern and Western Bantu branches spreading over large parts of Central, Eastern, and 
Southern Africa. There is still debate whether this linguistic divide is correlated with a genetic distinction between Eastern 
and Western Bantu speakers. During their expansion, Bantu speakers would have come into contact with diverse local 
populations, such as the Khoisan hunter-gatherers and pastoralists of southern Africa, with whom they may have 
intermarried. In this study, we analyze complete mtDNA genome sequences from over 900 Bantu-speaking individuals from 
Angola, Zambia, Namibia, and Botswana to investigate the demographic processes at play during the last stages of the 
Bantu expansion. Our results show that most of these Bantu-speaking populations are genetically very homogenous, with 
no genetic division between speakers of Eastern and Western Bantu languages. Most of the mtDNA diversity in our dataset 
is due to different degrees of admixture with autochthonous populations. Only the pastoralist Himba and Herero stand out 
due to high frequencies of particular L3f and L3d lineages; the latter are also found in the neighboring Damara, who speak a 
Khoisan language and were foragers and small-stock herders. In contrast, the close cultural and linguistic relatives of the 
Herero and Himba, the Kuvale, are genetically similar to other Bantu-speakers. Nevertheless, as demonstrated by resampling 
tests, the genetic divergence of Herero, Himba, and Kuvale is compatible with a common shared ancestry with high levels of 
drift, while the similarity of the Herero, Himba, and Damara probably reflects admixture, as also suggested by linguistic 
analyses.

Introduction

Bantu languages started to diffuse from their homeland in the Grassfields of Cameroon around
4,000-5,000 years ago, reaching the southernmost areas of the continent in only a few thousand
years [1-5]. This spread, strongly associated in its later phases with the diffusion of
technological advances related to metallurgy and an agricultural lifestyle, was probably the
result of a long-distance migration of people who partially replaced the local forager and
pastoralist populations, or intermixed with them [2,6,7]. From a linguistic perspective, the
genealogical unity of the Bantu family is certain, even though its boundary with other branches
of the Niger-Congo phylum is not clear-cut and the internal classification and distinction
between languages and dialects is highly debated [4,8]. The region close to the putative
homeland harbors the highest linguistic diversity. The first Bantu branches to split off, such
as Mbam-Bubi and North-West Bantu, are confined to Cameroon and immediately neighboring regions
[9]. The remainder of the Bantu languages predominantly belong to two major branches, namely
Eastern Bantu and Western Bantu, which are further divided into several sub-groups. Although a
recent investigation finds a distinct trace of the eastern route of the Bantu migration in
Y-chromosomal variation [10], other molecular anthropological studies fail to find evidence for
a genetic differentiation of the populations speaking Western and Eastern Bantu languages
[11,12].

Southern Africa represents the last phase of the Bantu expansion. Archaeological data reveal
traces of an agricultural way of subsistence in Namibia, Zambia and Botswana around 2000-1200
years ago [1,13,14], preceded a few centuries earlier by an immigration of pastoralist cultures
[15,16]. Thus, in these areas, the presumably Bantu-speaking agriculturalist immigrants would
have met populations of both hunter-gatherers and pastoralists, whose descendants comprise the
linguistically, culturally, and genetically diverse "Khoisan" populations [17,18].

The Bantu-speaking populations nowadays inhabiting southern Africa are quite diverse
linguistically and culturally, comprising pastoralists, agro-pastoralists, and agriculturalists
who speak languages belonging to several different subgroups of both Eastern and Western Bantu;
these populations share the same territory and are often involved in trade. From a genetic
perspective, these populations appear to be relatively homogenous, with few differences even
among linguistically distinct populations [6,11,12,19]. The main genetic signal characterizing
the people at the southernmost edges of the Bantu expansion is the degree of admixture with the
autochthonous populations; this can be explicitly measured by the presence of the characteristic
mtDNA haplogroups L0d and L0k and Y-chromosomal haplogroups A-M51, A-M23, and B-M112 [20-22].
Admixture with autochthonous peoples in Bantu-speaking populations is detectable predominantly
in the maternal line, in accordance with sex-biased gene flow [20,23]. The level of admixture
differs considerably among populations; in particular, substantial proportions of mtDNA
haplogroups L0d and/or L0k are observed in the pastoralist Kuvale from southwestern Angola [21],
in the Fwe of southwestern Zambia [20], and in the Zulu and Xhosa from South Africa [24]. In
contrast, in populations from eastern Zambia, Zimbabwe, and Mozambique these characteristic
autochthonous haplogroups are found at a frequency of at most 3% [19,25,26].

Among the culturally distinct populations in southern Africa are 
the Herero, Himba, and Kuvale from northern Namibia and 
southern Angola, who speak dialects of the same Bantu language 
and practice intensive semi-nomadic cattle pastoralism. The 
Herero and Himba appear genetically distinct from other Bantu-speaking
populations of the area, including the culturally similar
Kuvale [17,21,27,28]. Genetically, the closest relatives of the
Herero and Himba are the Damara [17,18], hunter-gatherers and
small stock herders who speak a Khoisan language of the Khoe-Kwadi
family [29]. Intriguingly, the Herero were known as
"Cattle Damara" and the Damara were referred to as "Berg 
Damara" in previous literature [29]. 

In this study we analyze complete mtDNA genome sequences of 944 Bantu-speaking individuals from
Angola, Zambia, Namibia, and Botswana to investigate the maternal genetic history of Bantu
speakers of southern Africa. We also include 38 Damara mtDNA genome sequences to further
investigate the close genetic relationship between the Herero, Himba, and Damara that emerged in
previous research [18]. We focus on the following research questions: 1) Does the linguistic
division into Western and Eastern Bantu correlate with genetic divergence? 2) To what extent did
the immigrating Bantu-speaking agriculturalists intermarry with autochthonous populations? 3)
What factors can explain the genetic divergence between the culturally and linguistically
closely related Himba, Herero, and Kuvale on the one hand, and the genetic proximity of the
Himba and Herero to the culturally and linguistically very distinct Damara on the other? Our
results reveal a general homogeneity of the maternal lineages of Bantu speakers of Angola and
Zambia and suggest different demographic histories for the Herero, Himba and Damara from Namibia
as well as for Bantu-speaking populations of southern Botswana.

Materials and Methods 

Ethics Statement 

This study was approved by the Ethics Committee of the 
University of Leipzig, the Research Ethics Committee of the 
University of Zambia and Namibe's Provincial Health Department.
Published samples from Botswana and Namibia come from
a sample collection authorized by the governments of Botswana 
and of Namibia (Research permit CYSC 1/17/2 IV (8) from the 
Ministry of Youth Sport and Culture of Botswana, and 17/3/3 
from the Ministry of Health and Social Services of Namibia). 
Samples from several populations of southwestern Angola were 
collected as described by [21]. Each individual gave written 
consent after the purpose of the study was explained with the help 
of local translators. Individuals were assigned to populations 
according to the ethnolinguistic affiliation (i.e., primary language 
spoken) of their maternal grandmother, as declared during sample 
collection. 

Samples and mtDNA Sequence Data 

Details on the samples and DNA extraction are given in 
[12,17,20,21]. Full mtDNA sequence data were generated from 
genome libraries tagged with either single or double indexes, and 
enriched for mtDNA following protocols described previously 
[30,31]. The libraries were sequenced on the Illumina GAIIx 
(Solexa) platform, using either single or paired end runs of 76 bp 
length, resulting in an average coverage of ~400x. Sequences 
were manually checked with Bioedit (www.mbio.ncsu.edu/BioEdit/bioedit.html)
and read alignments were screened with
ma [32] to confirm indels. To minimize the impact of missing
data, we applied imputation using stringent criteria, replacing 
missing sites with the nucleotide that was present in at least two 
otherwise identical haplotypes of the dataset. Before imputation, 
47 individuals presented missing data (8.4% of the total number of 
sequences, with a maximum of four sites with missing data per 
individual); after imputation, the number of individuals with 
missing data was reduced to 23 (3.7% of the total number of 
sequences, with a maximum of three sites with missing data per 
individual), with a total of 26 missing sites imputed. 
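The imputation rule described above can be sketched as follows. This is a minimal illustration and not the authors' procedure: the symbol `N` for a missing site, the function names, and the reading of "otherwise identical" as "agreeing at every site that is called in both sequences" are all assumptions made for the example.

```python
from collections import Counter

MISSING = "N"  # assumed symbol for a site without a base call


def compatible(a, b):
    """True if two aligned sequences agree at every site called in both."""
    return all(x == y or MISSING in (x, y) for x, y in zip(a, b))


def impute(seqs, min_support=2):
    """Fill a missing site only if at least `min_support` otherwise identical
    sequences carry the same nucleotide there and none carries a different one."""
    result = []
    for i, target in enumerate(seqs):
        sites = list(target)
        for pos, base in enumerate(target):
            if base != MISSING:
                continue
            # Collect the nucleotides seen at this site in compatible sequences
            votes = Counter(
                other[pos]
                for j, other in enumerate(seqs)
                if j != i and other[pos] != MISSING and compatible(target, other)
            )
            if len(votes) == 1 and sum(votes.values()) >= min_support:
                sites[pos] = next(iter(votes))
        result.append("".join(sites))
    return result


# Toy alignment: the third sequence has two identical donors, the fifth has none
toy = ["ACGT", "ACGT", "ACNT", "ACGA", "TNGT"]
print(impute(toy))
```

With a stringent threshold such as this, a missing site stays missing whenever fewer than two matching haplotypes exist, which is consistent with some individuals retaining missing data after imputation.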

One hundred and ninety-seven sequences from Botswana, Namibia, and Angola were previously
included in studies focusing on haplogroups L0d and L0k as well as on the prehistory of Khoisan
populations [18,33], while a subset of 169 sequences from Zambia was included in [20]; the
GenBank accession numbers of these samples can be found in Table S1. The remaining 446 sequences
from Zambia and 170 sequences from Angola have not yet been reported and are available from
GenBank with accession numbers KJ185394-KJ186009. The final alignment consists of 982 sequences
of 16465 bp. Positions with missing data as well as the poly-C regions (np 303-315, 16183-16194)
were removed from all analyses; in addition, 45 positions with indels were removed from all
analyses run in Arlequin and from sequences used for network construction.

Our dataset includes speakers of several Bantu languages belonging to both the Western and the
Eastern branches of Bantu according to the classification found on Glottolog 2.2
(http://glottolog.org/). While we were able to group most of the samples into 17
ethnolinguistically homogenous populations that correspond to the identification of donors'
maternal grandmothers, some ethnolinguistic groups were represented by only a few individuals.
In these cases, we united samples in five aggregate "populations" of speakers of closely related
varieties based mainly on linguistic criteria, but ensuring that the resulting populations were
genetically homogeneous as shown by non-significant between-population variance in AMOVA
analyses and non-significant ΦST distances (see Table S1 for the ethnolinguistic affiliation of
each individual and the composition of the aggregate "populations"). In addition, 49 individuals
sampled in Namibia and Zambia could not be grouped into populations with a meaningful sample
size or had an unclear ethnic affiliation; these were labeled as "others" and included only in
lineage-based analyses (i.e. networks and phylogenetic trees). Table S1 provides details on the
country of sampling, ethnolinguistic affiliation, and GenBank accession number for each sample.
We also included the Damara, who speak a Khoe language rather than a Bantu language, because of
their known genetic proximity to Herero and Himba [17,18]. The rough geographic location of the
23 populations included in the study can be seen in Figure 1, while Table 1 summarizes the
information on their country of origin, linguistic affiliation, and subsistence.

To investigate the variation in haplogroups L3d and L3f we 
further included published data from African populations: for the 
visual depiction in Surfer maps, haplogroup frequency data was 
collected from the literature as summarized in Table S2, while for 
networks and BEAST runs, 28 published complete mtDNA 
genome sequences available from GenBank were added to the 
alignment [34-38]. 

Figure 1. Map showing the rough geographical location of populations, colored by linguistic
affiliation. Abbreviations of population labels are as specified in Table 1.
doi:10.1371/journal.pone.0099117.g001

Statistical Analyses

Haplogroup assignment was performed with the online tool Haplogrep [39]. Haplogroup affiliation
of individuals belonging to haplogroups L0d and L0k was further defined following the
nomenclature reported in [33] (see Table S1). Analyses of Molecular Variance (AMOVA), values of
sequence diversity, and matrices of pairwise ΦST values were computed in Arlequin ver. 3.11,
while values of nucleotide diversity and variance were calculated in R with the package Pegas
[40]. A correspondence analysis (CA) based on haplogroup frequencies was performed with the
package ca [41], and non-metric Multi-Dimensional Scaling (MDS) analyses based on pairwise ΦST
values were performed with the function "isoMDS" from the package MASS [42]. The Africa-wide
variation in frequency of haplogroups L3d and L3f was visualized on a map with the software
Surfer ver. 10.4.799 (Golden Software Inc.). Median-joining networks [43] with no pre- or
post-processing steps were computed with Network 4.11 (www.fluxus-engineering.com) and
visualized in Network publisher 1.3.0.0. For the networks of haplogroups L3d and L3f all sites
were given equal weights, while for the network of haplogroup L0d the hypervariable positions
were downweighted by one third in order to reduce the effect of reticulation. A Mantel test was
performed between genetic (ΦST) and geographic distances with the R package vegan [44];
geographic distances between populations were averaged over GPS data from the individual
sampling locations with the function rdist.earth of the package fields [45].
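The logic of the Mantel test between genetic and geographic distances can be illustrated with a short sketch. It is written in Python rather than R and does not reproduce the vegan or fields implementations: the coordinates and the genetic distances are invented toy values, a single coordinate per population is used instead of averaging over individual sampling locations, and the one-sided permutation p-value is one common convention, assumed here.

```python
import math
import random


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def pearson(x, y):
    """Pearson correlation coefficient of two equally long lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def lower_triangle(matrix, order):
    """Entries below the diagonal after reordering rows and columns by `order`."""
    return [matrix[order[i]][order[j]] for i in range(len(order)) for j in range(i)]


def mantel(genetic, geographic, permutations=999, seed=1):
    """Correlation of two distance matrices with a one-sided permutation p-value."""
    rng = random.Random(seed)
    order = list(range(len(genetic)))
    geo_vec = lower_triangle(geographic, order)
    r_obs = pearson(lower_triangle(genetic, order), geo_vec)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(order)  # permute population labels of one matrix only
        if pearson(lower_triangle(genetic, order), geo_vec) >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (permutations + 1)


# Hypothetical population coordinates (latitude, longitude in decimal degrees)
coords = [(-15.2, 12.1), (-17.5, 24.3), (-22.6, 17.1), (-24.7, 25.9), (-13.0, 28.6)]
geo = [[haversine_km(*p, *q) for q in coords] for p in coords]
# Toy genetic distances that grow linearly with geographic distance
gen = [[0.0001 * d for d in row] for row in geo]
r, p = mantel(gen, geo)
print(r, p)
```

Permuting rows and columns together, rather than shuffling individual matrix cells, keeps the dependence among distances that share a population, which is why a plain correlation test is not used for distance matrices.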

BEAST (v1.7.2; [46]) was used to construct Bayesian Skyline Plots and phylogenetic trees, based
on the complete mtDNA sequence and using the mutation rate of 1.665 × 10^-8 from Soares et al.
[47]. A Generalized Time Reversible (GTR) model was applied, and multiple runs were performed
for each dataset, using chain lengths of 10, 20 or 30 million steps for single haplogroups and
populations. For the schematic tree of the whole dataset, a chain length of 40 million steps and
a GTR mutation model were used. The most probable tree from the BEAST runs was assembled with
TreeAnnotator and drawn with FigTree v1.4.0.

Resampling tests were performed in R to investigate the possible shared ancestry of the Herero,
Himba, and Damara on the one hand, and the Herero, Himba, and Kuvale on the other,
notwithstanding the absence of haplogroup L3f in the Damara and the absence of L3d in the Kuvale
and the concomitant high frequencies of these haplogroups in the Herero and Himba. In both cases
we proceeded as follows: the Herero and the Himba were considered a single population with a
sample size of 51 individuals, while for the Damara and Kuvale we used the actual sample sizes
included in the study (i.e. 38 and 53 individuals, respectively). We then created a series of
hypothetical ancestral populations with Ne = 2000 and a number of marked individuals
corresponding to a range of frequencies of the haplogroup of interest. Each ancestral population
was split into two daughter populations with Ne = 1000 (one population corresponding to the
Himba/Herero and the other corresponding to the Damara or the Kuvale). From each of these we
repeatedly sampled the same number of individuals with replacement (i.e. the population size was
kept constant) for a number of generations corresponding to 500, 1000, or 2000 years (with a
generation time of 25 years). After the final resampling step, a number of individuals
corresponding to the population samples of interest (i.e. 53 for Kuvale, 51 for Himba/Herero, or
38 for Damara) were sampled 100 times from the two daughter populations, and the probability of
having a frequency of the haplogroup of interest within the range of the respective confidence
intervals for both populations simultaneously was recorded. The entire process was repeated
10,000 times for each initial haplogroup frequency tested and each of the three split times
considered, and the average probabilities were recorded in a table. No migration was considered
in any of the simulations. The procedure is further described in the Results section, and the R
script used to perform the simulations is available upon request from the authors.
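The resampling scheme can be sketched as a simple drift simulation. The code below is an illustration in Python, not the authors' R script, and it simplifies in ways that are assumptions of this example: both daughter populations start at exactly the ancestral frequency instead of being drawn from a marked ancestral population of 2000, the final samples are drawn with replacement, mutation is ignored, and far fewer replicates are run than the 10,000 described above. All function and parameter names are hypothetical.

```python
import random


def drift(count, n_e, generations, rng):
    """Resample n_e individuals with replacement in every generation."""
    for _ in range(generations):
        p = count / n_e
        count = sum(rng.random() < p for _ in range(n_e))
    return count


def sample_freq(count, n_e, sample_size, rng):
    """Haplogroup frequency in a random sample from the population."""
    p = count / n_e
    return sum(rng.random() < p for _ in range(sample_size)) / sample_size


def joint_probability(init_freq, years, ci_a, ci_b, n_a, n_b,
                      n_e=1000, gen_time=25, replicates=200, draws=100, seed=1):
    """Average probability that both sampled frequencies fall inside their
    observed confidence intervals after independent drift since the split."""
    rng = random.Random(seed)
    generations = years // gen_time
    start = round(init_freq * n_e)
    total = 0.0
    for _ in range(replicates):
        a = drift(start, n_e, generations, rng)  # e.g. Himba/Herero lineage
        b = drift(start, n_e, generations, rng)  # e.g. Damara or Kuvale lineage
        hits = 0
        for _ in range(draws):
            fa = sample_freq(a, n_e, n_a, rng)
            fb = sample_freq(b, n_e, n_b, rng)
            if ci_a[0] <= fa <= ci_a[1] and ci_b[0] <= fb <= ci_b[1]:
                hits += 1
        total += hits / draws
    return total / replicates


# Sample sizes from the text (51 Himba/Herero, 38 Damara); the two intervals
# and the ancestral frequency of 15% are illustrative inputs for this sketch.
prob = joint_probability(0.15, 2000, (0.19, 0.46), (0.0, 0.09), 51, 38, replicates=20)
print(prob)
```

Repeating the call over a grid of initial frequencies and the three split times would give a table of average probabilities of the kind described in the text.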

Results 

Genetic Structure of Southern African Bantu-speakers 

As can be seen from Table S3 and Figure S1 in File S1, haplogroups found in relatively high
frequency across most of the populations of the dataset are L0a, L1c, L2a, and L3e. Other
haplogroups, however, are more restricted, being found in only a few populations; of these, L3d
and L3f (discussed in detail below) show a particularly striking distribution, being found in
very high frequency only in the populations of Namibia.

There is very little discernible structure in the maternal 
genepool of the Bantu-speaking populations of southern Africa, 
as shown by a three-dimensional MDS analysis. Only two distinct 
groups of populations emerge (Figure 2A): the Himba, Herero, 
and the non-Bantu-speaking Damara from Namibia on the one 
hand, and the Kuvale from Angola as well as the Tswana and 
Kgalagadi from Botswana on the other hand; the Tswana and 
Kgalagadi are separated from their geographic neighbors the 
Wider Shona. The third dimension, however, splits the Damara 
from the Himba and Herero. It is notable that the Kuvale are 
closer to other Bantu-speaking groups than to the Himba and the 
Herero, who are genetically more similar to the Khoe-speaking 
Damara in spite of being culturally and linguistically related to the 
Kuvale. 

Haplogroups L0d and L0k are known to be an introgression from autochthonous populations
[18,24,33,35] and are thus indicative of post-immigration contact rather than reflecting the
genetic relationships among Bantu-speakers themselves; the differential effects of gene flow
from autochthonous populations are addressed below. When excluding these clearly introgressed
lineages from the MDS analysis, the same two distinctive groups of populations, namely the
Kuvale, Tswana, and Kgalagadi, and the Himba, Herero, and Damara, emerge in an even more
pronounced manner, with the third dimension again separating the Damara from the Himba and
Herero; all the other Bantu-speaking populations, in contrast, cluster very closely (Figure 2B).

As can be seen in the CA plots displayed in Figure S2 in File S1, the distinct position of the
Himba, Herero and Damara populations is driven by their high levels of haplogroup L3d (Figure
S2A in File S1), which is completely absent from the Kuvale (Table S3). In contrast, the Tswana
and Kgalagadi, who stand out in the MDS analysis, are no longer separated in these CA plots,
suggesting that their separate position in the MDS plots is mainly due to divergent sequence
types rather than a distinct haplogroup composition. The CA plots additionally highlight other
aspects of the data, separating the Northeast Zambia population (characterized by the presence
of the otherwise largely absent haplogroups L0f and L4) or the Fwe and Shanjo, who have high
frequencies of haplogroup L0k (Figure S2A in File S1). When excluding these outliers,
populations with very high frequencies of haplogroup L0d (Kgalagadi, Tswana, Wider Shona, and
Kuvale) stand out (Figure S2B in File S1). The influence of different levels of admixture with
autochthonous populations on the genetic structure of the southern African Bantu-speaking groups
is additionally illustrated by the lack of discernible clusters when the introgressed
haplogroups L0d and L0k are excluded (Figure S2C in File S1).

The difference of the Himba, Herero, and Damara from the 
other populations included in this study also becomes apparent 
from measures of genetic diversity (Table 1): while diversity is high 
for the Bantu-speaking populations in general, with many of the 
ethnolinguistically defined self-identified groups (e.g. Nyaneka, 
Ovimbundu, Kwamashi, Mbukushu) having values of sequence 
diversity of 0.99-1.00, and with nucleotide diversity ranging 
between 0.0033 and 0.0040, the Himba, Herero and Damara 
stand out in having both very low sequence diversity (0.93, 0.94, 
and 0.89, respectively) and nucleotide diversity (0.0022 for the 
Herero-speakers, 0.0025 for the Damara). The Kuvale again differ 
from the other Herero-speakers: although their sequence diversity 
is relatively low (0.95), their nucleotide diversity is twice as high as 
that of their linguistic and cultural relatives (0.0040); this diversity 
pattern resembles that of the Fwe from southwestern Zambia, who 
have a sequence diversity of only 0.93, but nucleotide diversity of 
0.0038. 
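The contrast between low sequence diversity and high nucleotide diversity is easier to see with the two measures written out. The sketch below uses the standard textbook definitions (the probability that two randomly drawn sequences differ, and the mean number of pairwise differences per site); it is an assumption that these match the Arlequin and Pegas estimators closely enough for illustration, and the sequences are invented toy data.

```python
from collections import Counter
from itertools import combinations


def sequence_diversity(seqs):
    """Unbiased probability that two randomly drawn sequences are different."""
    n = len(seqs)
    freqs = [count / n for count in Counter(seqs).values()]
    return n / (n - 1) * (1 - sum(f * f for f in freqs))


def nucleotide_diversity(seqs):
    """Mean number of differences per site over all pairs of sequences."""
    n, length = len(seqs), len(seqs[0])
    diffs = sum(sum(a != b for a, b in zip(s, t)) for s, t in combinations(seqs, 2))
    return diffs / (n * (n - 1) / 2) / length


# Toy population: one frequent haplotype plus two derived ones
toy = ["AAAA", "AAAA", "AAAT", "AATT"]
print(sequence_diversity(toy), nucleotide_diversity(toy))
```

A population dominated by a few haplotypes that are themselves many mutations apart scores low on the first measure and high on the second, which is the pattern reported here for the Kuvale and the Fwe.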



Figure 2. Three-dimensional MDS analysis based on pairwise ΦST values between populations.
Color coding by linguistic affiliation (East Bantu, West Bantu, Khoisan); abbreviations of
population labels are as specified in Table 1. A. Including all sequences, stress = 5.35. B.
Excluding L0d and L0k sequences, stress = 5.34. Populations discussed in the main text are
highlighted with bold font and a red line in the plot.
doi:10.1371/journal.pone.0099117.g002
An Analysis of Molecular Variance (Table 2) demonstrates the relative lack of differentiation
among the southern African populations, with only ~6% of the variation being found between
populations, irrespective of whether haplogroups L0d and L0k, which stem from post-immigration
admixture, are included or not. A large proportion of the variance between populations is due to
the differentiation of the Himba, Herero, and Damara, as shown by the fact that the
between-population variance drops to ~3% when excluding these populations. Affiliation to either
of the two major branches of the Bantu family (Eastern Bantu vs. Western Bantu) does not account
for any genetic structure, as seen by the complete absence of variance between groups. The three
pastoralist populations Herero, Himba, and Kuvale are somewhat distinct from the non-pastoralist
Bantu-speaking populations, as shown by the significant between-group variance of 2.6% and 3.5%,
respectively, depending on whether haplogroups L0d and L0k are included in the analysis or not.
Nevertheless, the variance among populations within the "pastoralist" and "non-pastoralist"
groups (3.97% and 3.83%, respectively) is higher than that between the groups. Only a rough
geographic subdivision correlates with some degree of genetic structure: in this case the
between-group variance rises to ~5% (as opposed to a variance among populations within groups of
~1.7%). On a finer scale, too, the pairwise geographic distances correlate with the genetic
distances: a Mantel test gives significant correlations both when including and excluding L0d
and L0k sequences (r = 0.3286, p = 0.015 and r = 0.2575, p = 0.043, respectively).

Table 2. Results of AMOVA analyses.

Grouping | n of groups | between groups (%) | between pops (within groups) (%) | within pops (%)

a) Including haplogroups L0d/L0k
All 23 populations                                        | 1 | n/a    | 5.51** | 94.49
20 populations (excl. HER, HIM, DAM)                      | 1 | n/a    | 3.22** | 96.78
Linguistic criteria (West vs East Bantu)^a                | 2 | -0.14  | 4.72** | 95.42
Subsistence criteria (Pastoralists vs non-pastoralists)^a | 2 | 2.64*  | 3.97** | 93.39
Geographic criteria (NW, SW, SE, Centre, NE)^b            | 5 | 4.80** | 1.77** | 93.43

b) Excluding haplogroups L0d/L0k
All 23 populations                                        | 1 | n/a    | 5.94** | 94.06
20 populations (excl. HER, HIM, DAM)                      | 1 | n/a    | 2.58** | 97.42
Linguistic criteria (West vs East Bantu)^a                | 2 | -0.33  | 4.85** | 95.48
Subsistence criteria (Pastoralists vs non-pastoralists)^a | 2 | 3.50*  | 3.83** | 92.67
Geographic criteria (NW, SW, SE, Centre, NE)^b            | 5 | 5.31** | 1.64** | 93.06

*significant at 0.05 level; **significant at 0.01 level.
^a The grouping by linguistic and subsistence criteria followed the assignment in Table 1.
^b Geographic grouping: NW = OVM, NYA, KUV, GAN; SW = HER, HIM; SE = KGA, TSW, SHO; NE = NEZ;
CENTRE = CHO, MBN, NKO, LOZ, LUY, KWA, SHA, MBK, TOT, FWE, SUB, TNG.
Note: The groupings by linguistic, subsistence, and geographic criteria were performed without
the Damara, as these cannot be assigned to the linguistic grouping West Bantu or East Bantu.
doi:10.1371/journal.pone.0099117.t002

Haplogroups L0d and L0k

The mtDNA haplogroups L0d and L0k have been convincingly shown to be characteristic of
autochthonous populations of southern Africa [18,24,35]. They therefore represent an ideal
measure for detecting admixture in the maternal line between the immigrating Bantu-speaking
groups and these autochthonous populations. The frequency of these haplogroups ranges from
complete absence in some of the populations of Zambia to 53% in the Kgalagadi of southern
Botswana (Table S3; see Figure S3A in File S1 for a graphic representation of the distribution
of L0d and L0k lineages in the populations considered here). Interestingly, hardly any of the
L0d and L0k sequences found in the Bantu-speaking populations are directly shared with extant
Khoisan foragers or pastoralists. As shown previously [33], the L0k sequences found in
Bantu-speaking populations diverge considerably from those found in extant Khoisan populations.
With respect to haplogroup L0d, the network shows that only three Bantu-speaking populations
(involving four Herero, two Tswana, and one Kgalagadi individual) share sequences directly with
Khoisan (Figure 3). Three branches of the network are found nearly exclusively in Bantu-speaking
populations: one of these (belonging to subhaplogroup L0d1a and indicated by arrow 1 in Figure
3) is derived from a sequence type restricted to Khoe-speaking Shua from northeastern Botswana
and is found in Kuvale and Himba, with one Kuvale type shared with Nyaneka. The two others
(belonging to subhaplogroup L0d1b1 and indicated by arrows 2 and 3 in the figure) are at least
11 mutations distant from the closest Khoisan haplotype; the eight divergent haplotypes found on
these branches again belong to Kuvale as well as to different populations of Zambia and Angola.
The only population found in this branch that does not speak a Bantu language is the Damara.

Figure 3. Network of complete mtDNA genome sequences from southern Africa belonging to
haplogroup L0d. Branches highlighted by arrows are discussed in the text. Only sublineages of
L0d2a1, L0d1b2a, L0d1b2b and L0d1c1 are shared directly between Bantu-speaking and
Khoisan-speaking populations. Legend: □ Damara.
doi:10.1371/journal.pone.0099117.g003

Haplogroups L3d and L3f

The Himba and Herero stand out among the Bantu-speaking populations of southern Africa in having
very high frequencies of haplogroups L3d (38% and 47%, respectively) and L3f (29% and 33%,
respectively), while in their cultural and linguistic relatives, the Kuvale, L3d is absent and
L3f has a frequency of only 6% (Table S3). In contrast, the geographic neighbors of the Himba
and Herero, the Khoe-speaking Damara, have 63% L3d but completely lack L3f (cf. Table S3). The
high levels of these two haplogroups are thus clearly of key importance for understanding the
prehistory of the Herero, Himba, Kuvale, and Damara populations.

While haplogroup L3d is found across Africa at low frequency (Figure 4A, see Table S2 for the
populations included in the Surfer map), the lineages found at high frequency in southwestern
Africa mostly belong to a single, highly divergent branch, namely L3d3a1 (Figure 4B, Figure S3B
in File S1). This can be further divided into two clusters: one major node predominating in
Khoisan, from which several haplotypes found in Zambian populations radiate, and a derived
starlike cluster found mainly in the Himba, Herero, and Damara (indicated by an asterisk in
Figure 4B). Dating the signal of expansion detectable in the L3d3a1 branch with the rho
statistic [48] and the calculator from Soares et al. [47] gives an age of 395-6668 years BP,
while the expansion detectable in the Himba, Herero, and Damara dates to 711-2130 years BP. The
first date is in good accordance with the pronounced branching dating to 2,500-3,000 years ago
in a Bayesian tree of L3d sequences (highlighted in Figure S4A in File S1).

Figure 4. Surfer maps and networks of haplogroups L3d and L3f. A: Surfer map of L3d frequencies
in Africa. B: Network of African complete mtDNA genome sequences belonging to haplogroup L3d. C:
Surfer map of L3f frequencies in Africa. D: Network of African complete mtDNA genome sequences
belonging to haplogroup L3f.
doi:10.1371/journal.pone.0099117.g004

In contrast to L3d, L3f is found in frequencies >20% not only in southwestern Africa, but also
in some populations of the Cameroon/Chad border areas ([49,50]; Figure 4C, Table S2).
Nevertheless, the sequences found in the Himba and Herero all fall onto one restricted branch,
L3f1b4a (Figure 4D, Figure S3B in File S1). Several other Bantu-speaking populations from
Namibia and Angola fall into this cluster as well, and the Himba and Herero share two haplotypes
with their cultural and linguistic relatives, the Kuvale. This cluster exhibits a signal of
expansion which can be dated with the rho statistic to between 526 and 4234 years BP; this
corresponds to a pronounced branching 2,500-3,000 years ago in the Bayesian tree of L3f
sequences (highlighted in Figure S4B in File S1).
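For readers unfamiliar with the rho statistic, the point estimate is simply the average number of mutations separating the sampled sequences of a clade from its root haplotype. The sketch below computes it for an invented toy clade and converts it to years with a linear clock of one mutation every 3533 years, the figure quoted in this paper for the full mtDNA genome. The linear conversion is an assumption made for illustration only; the calculator of Soares et al. [47] used for the dates reported here is not reproduced, and no standard error is computed.

```python
def rho(clade):
    """Average number of mutations from the root haplotype to each individual.

    `clade` is a list of (mutations_from_root, number_of_individuals) pairs,
    one pair per haplotype in the clade.
    """
    n = sum(count for _, count in clade)
    return sum(steps * count for steps, count in clade) / n


def linear_age(rho_value, years_per_mutation=3533):
    """Convert rho to years under a simple linear clock (an assumption)."""
    return rho_value * years_per_mutation


# Toy starlike clade: 10 individuals carry the root type, 4 are one
# mutation away and 1 is two mutations away
toy_clade = [(0, 10), (1, 4), (2, 1)]
estimate = rho(toy_clade)
print(estimate, linear_age(estimate))
```

Because rho is small when most individuals still carry the root haplotype, a starlike cluster dominated by one frequent type yields a young age estimate, in line with the recent expansion dates discussed above.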

The Damara, who have the highest frequency of L3d and who 
share a highly frequent L3d haplotype with the Herero and 
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Himba, entirely lack L3f. This finding is compatible with two 
scenarios: 1) The mtDNA pool of the Damara, the Himba, and 
Herero is derived from a single ancestral population, and 
haplogroup L3f was lost in the Damara due to genetic drift. 2) 
The Damara mtDNA lineages stem from a different ancestral 
population than the Himba and Herero, and the Himba and 
Herero incorporated large amounts of haplogroup L3d sequences 
through gene flow from Damara (a scenario that has been 
suggested from analyses of linguistic data [51]). In order to 
distinguish between these hypotheses, we performed resampling 
tests, assuming a frequency of 31% L3f (with 95% confidence 
intervals (C.I.) 19-46%) in the Himba/Herero as well as 24% of 
the single L3d3a haplotype shared with the Damara (C.I. 13- 
37%). For the Damara, a lack of haplogroup L3f (C.I. 0-9%) and 
32% of the L3d3a haplotype shared with Himba and Herero (C.I. 
18-49%) was assumed. In addition, in the resampling test in which 
we tried to assess the probability that the Himba, Herero, and 
Damara would have retained a single shared haplotype at high 
frequency, we included a probability of change of the L3d3a 
haplotype each generation with a rate of one mutation every 3533 
years following the rate of Soares et al. [47] for the full mtDNA 
genome. As can be seen from Table 3, the presence of the L3d 
haplotype shared at high frequency by the Himba, Herero, and 
Damara is expected with a probability >0.05 even after a split of 
2000 years if the frequency of this haplotype in the ancestral 
population ranged from ~10-50%. Conversely, if haplogroup L3f 
was present in the ancestral population at a frequency of ~5-30%, 
it could have drifted to high frequency in the Himba and Herero 
and subsequently been lost in the Damara if the split took place 
2000 years ago. Thus, the scenario of shared ancestry of Damara, 
Himba and Herero with subsequent loss through drift in the 
Damara of haplogroup L3f cannot be excluded. 
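
The logic of such a resampling test can be sketched as a simple Wright-Fisher drift simulation: two daughter populations start from the same ancestral haplogroup frequency, drift independently, and the share of replicates in which both end up inside their observed confidence intervals is recorded. This is a minimal sketch and not the procedure used in this study; the effective female population size, the 25-year generation time, the number of replicates, and all function names are hypothetical, and mutation is ignored.

```python
import random


def drift(freq, n_females, generations, rng):
    """Wright-Fisher drift of one haplogroup frequency in a haploid (mtDNA) population."""
    for _ in range(generations):
        carriers = sum(1 for _ in range(n_females) if rng.random() < freq)
        freq = carriers / n_females
        if freq in (0.0, 1.0):  # lost or fixed: no further change without mutation
            break
    return freq


def split_probability(anc_freq, n_females, years, range_a, range_b,
                      reps=200, gen_time=25, seed=1):
    """Fraction of replicates in which both daughter populations fall in their ranges."""
    rng = random.Random(seed)
    generations = years // gen_time
    hits = 0
    for _ in range(reps):
        freq_a = drift(anc_freq, n_females, generations, rng)
        freq_b = drift(anc_freq, n_females, generations, rng)
        if range_a[0] <= freq_a <= range_a[1] and range_b[0] <= freq_b <= range_b[1]:
            hits += 1
    return hits / reps


# L3f: 19-46% in the Himba/Herero and 0-9% in the Damara (intervals from the text);
# ancestral frequency 15%, split 2000 years ago, 100 females per population (assumed).
p = split_probability(0.15, 100, 2000, (0.19, 0.46), (0.00, 0.09))
print(f"share of replicates compatible with the observed L3f frequencies: {p:.3f}")
```

Repeating the call over a grid of ancestral frequencies and split times gives the kind of probability surface against which a 0.05 threshold can be applied.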

It is likewise intriguing that the culturally and linguistically 
closely related Herero, Himba, and Kuvale have such divergent 
mtDNA genepools. This might be explained in two ways: 1) these 
populations stem from a common ancestral population, and 
differential gene flow led to their strong divergence; 2) these 
populations have distinct maternal ancestors and their cultural and 
linguistic relationship is due to a shift in language and culture. 
These two alternatives were also assessed with a resampling test, 
assuming a frequency of 31% L3f (with 95% confidence intervals 
(C.I.) 19-46%) and 43% haplogroup L3d (C.I. 29-58%) in the 
Himba/Herero and a frequency of 5.7% L3f (C.I. 1-16%) and a 
lack of haplogroup L3d (C.I. 0-7%) in the Kuvale. As can be seen 
from Table 3, the hypothesis of a shared ancestor who carried 
both L3f and L3d is not compatible with the data: even though 
haplogroup L3f could have drifted to the observed frequencies if its 
frequency in the ancestral population ranged from ~8-31%, 
haplogroup L3d could not have been simultaneously lost from the 
Kuvale and drifted to the high frequencies currently observed in 
the Himba and Herero. 
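
The confidence intervals quoted for absent haplogroups (for example 0-7% for L3d in the Kuvale) can be understood through the exact binomial bound for a zero count. The sketch below assumes a Clopper-Pearson interval, which may not be the method used in this study, and the sample sizes passed in are hypothetical.

```python
def zero_count_upper_bound(n, confidence=0.95):
    """Upper limit of the two-sided exact binomial interval when 0 of n samples carry a haplogroup."""
    alpha = 1.0 - confidence
    # With zero observed carriers the upper limit solves (1 - p) ** n = alpha / 2.
    return 1.0 - (alpha / 2.0) ** (1.0 / n)


# Hypothetical sample sizes (not taken from this study).
for n in (20, 40, 60):
    print(n, round(zero_count_upper_bound(n), 3))
```

The bound shrinks as the sample grows, so a haplogroup that is unobserved in a small sample may still be present at an appreciable frequency in the population.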

Discussion 

Genetic Diversity of Bantu-speaking Populations and 
Western-Eastern Bantu Division 

Overall, the Bantu-speaking populations of southern Africa are 
genetically quite homogenous, with a few exceptions such as the 
Herero and Himba or Tswana and Kgalagadi. While linguistically 
the populations can be divided into those speaking Western Bantu 
languages and those speaking Eastern Bantu languages, this 
division is not detectable in the maternal genepool, with none of 
the variance among populations corresponding to this linguistic 
grouping (Table 2). Furthermore, the number of haplotypes shared 
between Eastern and Western Bantu speakers does not differ from 
the number of haplotypes shared within each linguistic group: 51 
of 258 haplotypes are shared among Eastern Bantu speakers, 80 of 
381 haplotypes are shared among Western Bantu speakers, and 52 
haplotypes are shared between Eastern and Western Bantu 
speakers. These results are in good accordance with a previous 
study [11] and support the suggestion that any potential genetic 
signal of the split between Eastern and Western populations was 
subsequently erased through admixture [52]. An alternative 
explanation for the lack of genetic differentiation between the 
populations speaking West and East Bantu languages is their 
possibly quite recent split, since East Bantu languages might be an 
offshoot of West Bantu languages [52]. Furthermore, the so-called 
Urewe pottery, the first Early Iron Age ceramic tradition of the 
Great Lakes region, is often linked with the arrival and spread of 
Bantu speakers in Eastern Africa [1]. The date of its emergence 
around 2500 years BP [53,54] can thus serve as an approximate 
starting point of the Eastern Bantu dispersal [55]. 
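
The within-group haplotype sharing reported above (51 of 258, about 19.8%, among Eastern Bantu speakers and 80 of 381, about 21.0%, among Western Bantu speakers) can be compared with a simple two-proportion z-test. This is only an illustrative check under a normal approximation; it is not a test reported in this study, and the function name is hypothetical.

```python
import math


def two_proportion_z(x1, n1, x2, n2):
    """Return (p1, p2, z, two-sided p-value) for a pooled two-proportion z-test."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0))))
    return p1, p2, z, p_value


p_east, p_west, z, p_value = two_proportion_z(51, 258, 80, 381)
print(f"East: {p_east:.3f}, West: {p_west:.3f}, z = {z:.2f}, p = {p_value:.2f}")
```

A z value well inside the range of -1.96 to 1.96 indicates that the two sharing proportions are statistically indistinguishable under these assumptions.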

Nevertheless, within the homogenous mtDNA landscape of 
southern African Bantu-speakers some populations do stand out. 
The most notable outliers are the Herero and Himba (discussed in 
detail below); apart from these, the Kgalagadi and Tswana as well 
as Northeast Zambia are separated in the MDS and CA analysis, 
respectively. The Tswana and Kgalagadi are characterized by very 
high frequencies of haplogroup L0d, which provides good 
evidence for extensive admixture in the maternal line with 
autochthonous populations (Figure S3A in File S1). Nevertheless, 
this high level of indigenous admixture is not the only reason for 
their distinctiveness, since they remain separate from other 
populations in the MDS analysis even when haplogroups L0d 
and L0k are excluded (Figure 2B). That this separation is mainly 
due to divergent sequence types rather than a distinct haplogroup 
composition is demonstrated by the fact that they do not stand out 
in the CA analysis (Figure S2 in File S1). The Tswana and 
Kgalagadi speak closely related languages belonging to the 
homogenous and close-knit Sotho-Tswana group, which is clearly 
distinct from surrounding language groups [56,57]. Speakers of 
these languages immigrated from further southeast into what is 
now Botswana only 500-800 years BP [58]; they would thus have 
been relatively isolated from the other Bantu-speaking populations 
included in this study. The admixture of the Kgalagadi and 
Tswana with autochthonous populations is likely to have taken 
place to a large extent before their immigration into their current 
area of settlement while they were still settled further to the 
southeast. This is evidenced by their complete lack of L0k (which is 
found in high frequency in Khoisan populations of western 
Botswana [33]) and by the distinctiveness of most of their L0d 
lineages, with only one Kgalagadi and two Tswana L0d sequences 
shared with Khoisan populations from Namibia and Botswana 
(Figure 3). The Northeast Zambia population shows some affinities 
with populations further to the northeast. For instance, it is the 
only population included here to carry haplogroups L0f and L4; 
for both of these haplogroups an eastern African origin has been 
suggested [19,59,60]. The presence of these haplogroups highlights 
the role played by admixture in the diversification of 
Bantu-speaking populations [60,61]. 

Admixture with Autochthonous Populations 

The degree of admixture between the immigrating Bantu-speaking 
agriculturalists and autochthonous populations is highly 
variable. On the one hand, some Zambian populations, such as 
the Nkoya, the Eastern Tonga, or the Totela, carry no 
autochthonous lineages at all, while on the other hand the Kuvale, 
Fwe, Tswana, Wider_Shona, and Kgalagadi have 21-53% of 
haplogroups L0d and/or L0k. There is a noticeable geographical 
cline in the presence of these two different autochthonous 
haplogroups (cf. [33]), with L0d being present at high frequency 
in Bantu-speaking populations from the southern range of our 
dataset (Botswana, Namibia, and southern Angola), while L0k is 
practically absent from these populations (Figure S3A in File S1). 
While the frequency distribution of L0d in our Bantu-speaking 
populations matches that of extant Khoisan populations (where 
the highest frequencies of L0d are found in populations of South 
Africa, Botswana, and Namibia), the highest frequencies of L0k in 
extant Khoisan populations are found in western Botswana and 
northeastern Namibia [33]. Together with the fact that practically 
all of the L0k lineages found in Bantu-speaking populations are 
highly divergent, this distribution of L0k provides good evidence 
that the Bantu-speaking immigrants into Zambia intermarried 
with genetically distinct autochthonous populations who are 
nowadays extinct [20,33]. 

L0d, too, provides evidence that the gene flow between 
indigenous populations and immigrating Bantu speech communities 
involved genetically distinct autochthonous populations. For 
instance, there are two divergent branches belonging to 
subhaplogroup L0d1b1 that are practically restricted to 
Bantu-speaking populations, being found in the Kuvale and Nyaneka of 
southwestern Angola and in some populations of western Zambia 
as well as one Damara (see arrows 2 and 3 in Figure 3). In 
addition, only the Herero, Tswana, and Kgalagadi share L0d 
haplotypes directly with neighboring Khoisan populations, and 
this to differing degrees. The admixture between Herero and 
Khoisan populations is likely to have taken place quite recently, as 
they share four out of their five L0d sequences (Table S4). In 
contrast, the Tswana share only two of their five L0d sequences 
with different Khoisan populations, while two sequences are at 
least four to five mutational steps distant from any Khoisan 
haplotypes. Rather surprisingly, the Kgalagadi, who are the 
Bantu-speaking population with the highest level of autochthonous 
haplogroups, share only one out of their ten L0d sequences 
directly with the neighboring Khoe-speaking G|ui; the other 
haplotypes are between one and at least eight mutational steps 
distant from the closest Khoisan sequence type (Table S4). These 
data indicate that the gene flow from autochthonous populations 
into most of the Bantu-speaking populations included in the 
dataset took place a long time ago and/or involved Khoisan 
populations who did not survive into the present. It is furthermore 
notable that the Mbukushu, who are reported to have been closely 
associated with Khwe populations, sharing villages and 
intermarrying with them [62], do not share any sequences with Khwe. 

Relationships of Kuvale, Himba, Herero, and Damara 

The most striking results of this study concern the genetic 
differences between the culturally and linguistically closely related 
Himba, Herero, and Kuvale on the one hand and the genetic 
similarity of the Herero and Himba to the culturally and 
linguistically distinct Damara, on the other. The Himba, Herero, 
and Damara differ in their maternal lineages from all other 
populations included here, as shown by the AMOVA results 
(Table 2) and the MDS and CA analyses (Figure 2 and Figure S2 
in File S1). This is in good accordance with analyses of 
genomewide SNP data in which the Himba and Damara also 
stand out as being distinct from other populations speaking 
Niger-Congo languages [17]. However, the Damara differ from the 
Himba and the Herero in their complete lack of haplogroup L3f, 
suggesting that they have had a different demographic history 
from the Himba and Herero - a hypothesis that is further 
supported by the Bayesian Skyline Plots for these populations 
(Figure S5 in File S1): these show a strong signal of recent 
expansion for the Herero and Himba but not the Damara 
(although such results should be taken with caution given the limits 
of this method to reconstruct very recent demographic events, 
especially in cases of reduced diversity). The high frequencies of 
shared sequences belonging to haplogroup L3d and the complete 
absence of haplogroup L3f in the Damara could reflect either of 
two scenarios: 1) The Himba, Herero, and Damara share a 
common ancestor who carried both L3d and L3f, but the Damara 
lost the latter haplogroup by drift; or 2) The ancestor of the Himba 
and Herero carried haplogroup L3f, but not L3d, and only 
incorporated L3d sequences through admixture with Damara in 
the maternal line. As demonstrated by the results of the resampling 
test (Table 3), the genetic data do not exclude a shared ancestry of 
the three populations, in good accordance with the fact that in 
previous literature both the Damara and the Herero were referred 
to as Damara, with the specification "Berg Damara" for the 
former and "Cattle Damara" for the latter [29]. 

However, the Kuvale, who are culturally and linguistically 
closely related to the Herero and Himba, lack haplogroup L3d 
entirely, but share L3f sequences with the Himba and Herero. As 
indicated by the results of the resampling test (Table 3), the 
Himba, Herero, and Kuvale could stem from a common maternal 
ancestral population, but only if it carried haplogroup L3f and not 
L3d. Thus, a shared ancestor of Himba, Herero, and Damara 
would have had to carry both L3d and L3f, whereas a shared 
ancestor of the Himba, Herero, and Kuvale would have had to 
carry only L3f and no L3d - clearly conflicting scenarios. It is thus 
plausible that the Himba, Herero, and Kuvale share a common 
ancestor and a common source of L3f lineages, and that the 
ancestor of the Himba and Herero picked up haplogroup L3d at a 
later stage through intermarriage with the Damara. This 
hypothesis coincides with the later stages of a detailed proposal 
for the history of the Herero-speaking peoples (i.e. the ancestors of 
the Kuvale, Himba, and Herero) based entirely on linguistic data 
[51]. This suggests a fairly recent immigration into southwestern 
Africa followed by different periods of intermarriage with 
indigenous peoples - both Bantu-speaking populations in 
northeastern Angola and Khoisan populations further south. This 
scenario of migration and differential intermarriage is potentially 
reflected in the signal of a founder event and expansion detectable 
in the Himba and Herero (Table 1, Figure 3, Figure S4 in File S1; 
cf. [63]) as well as in the close affinities of the Kuvale with 
neighboring populations of Angola and the proximity of the 
Herero and Himba to the Damara. 

The L0d lineages in the Kuvale were previously suggested to 
possibly stem from admixture with the now extinct Angolan 
Kwadi [21]. These were a pastoralist population who lived within 
the Kuvale territory and spoke a language related to the Khoe 
languages, a family that has been suggested to have been brought 
to southern Africa by a pre-Bantu migration of pastoralists [64]. 
Since haplogroup L0d is widespread across Khoisan foragers and 
pastoralists [18], it is difficult to unambiguously assign the Kuvale 
L0d lineages to a relatively recent pastoralist migration. However, 
a branch of haplogroup L0d that is restricted to the Kuvale, 
Himba, and Nyaneka (indicated by arrow 1 in Figure 3) derives 
from a sequence type found in seven Shua. These are a 
Khoe-speaking population of northeastern Botswana who are considered 
possible descendants of the Khoe-Kwadi-speaking pastoralists who 
would also have been the ancestors of the Kwadi [64]. Since the 
Shua are settled so far to the east of the Kuvale, direct admixture 
seems implausible, raising the possibility that these lineages derive 
from admixture with Kwadi. 

In contrast, subhaplogroup L3d3a has a much more confined 
distribution and is more likely to have been brought to the area by 
Khoe-speaking pastoralists as previously suggested [18]. This 
suggestion is in good accordance with the signal of expansion 
detectable in these lineages ~2,500-3,000 years BP (Figure S4 in 
File S1), since archaeological evidence of pastoralism is detectable 
from ~2,200 years in the region [16]. Surprisingly, while roughly 
50% of the maternal genepool of the Himba, Herero and Damara 
appear to stem from this putative Khoe admixture, in analyses of 
genomewide SNP data the Himba and Damara show no affinities 
with Khoe-speaking populations [17]. However, since the putative 
incorporation of Khoe maternal lineages might have involved only 
a few women related in the maternal line, followed by an 
expansion of this lineage after the incorporation of Damara 
women by the Himba and Herero ancestor, this lineage would 
have been retained unchanged due to the specific characteristics of 
mtDNA. In contrast, the signal of relationship with Khoe-speaking 
populations may have been lost from the autosomal DNA if this 
single admixture event was followed by several generations of 
intermarriage with non-Khoe populations. 
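
The contrast between mtDNA and autosomal DNA drawn above can be made concrete with a small calculation. The sketch assumes a deliberately simplified model, not one used in this study: a single founding woman contributes all Khoe ancestry, and in every following generation her descendants have children with partners who carry none. Her mtDNA is passed intact down the maternal line, while her expected autosomal contribution halves each generation.

```python
def expected_autosomal_fraction(generations):
    """Expected autosomal share inherited from one founding woman after repeated outmarriage."""
    return 0.5 ** generations


# Generation counts are hypothetical and chosen only for illustration.
for g in (1, 3, 5, 10):
    share = expected_autosomal_fraction(g)
    print(f"after {g:2d} generations: autosomal share = {share:.4f}, mtDNA lineage retained intact")
```

After only a handful of generations the expected autosomal share becomes too small to detect reliably, whereas the maternal lineage can still rise to high frequency through drift or expansion.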

It thus appears likely that the maternal ancestors of the Kuvale, 
Herero, and Himba had a haplogroup composition similar to that 
found in the Kuvale today, albeit with somewhat higher 
frequencies of L3f, which would have partly resulted from 
intermarriage with neighboring populations in what is now 
Angola. The shared ancestor of the modern-day Herero and 
Himba would later have incorporated Damara women carrying 
haplogroup L3d, themselves possibly the descendants of Khoe- 
speaking peoples, while the ancestor of the modern-day Kuvale 
would have intermarried with Kwadi pastoralists. Detailed 
Y-chromosomal as well as additional genome-wide analyses of the 
Damara, Himba, Herero, Kuvale, and other Bantu-speaking 
populations of southern Africa are needed to further investigate 
the prehistory of these groups. 

Conclusions 

In summary, we have been able to show that the maternal 
genepool of the Bantu-speaking populations of southern Africa is 
very homogenous. While the linguistic division into Western and 
Eastern Bantu does not correlate with genetic divergence, the 
results of the AMOVA and Mantel analyses demonstrate the 
impact of geography in structuring the genetic variation. 
Furthermore, there are big differences in the extent of 
intermarriage between Bantu-speaking agriculturalists and autochthonous 
peoples, with some populations showing no evidence of gene flow, 
while others, like the populations of Botswana, carry substantial 
proportions of autochthonous lineages. The lack of L0d/L0k 
sequences shared between Bantu and Khoisan populations 
suggests that the admixture undergone by most of the 
Bantu-speaking immigrants into southern Africa took place soon after 
their entering the region and partly involved now-extinct 
autochthonous populations. Lastly, the genetic data are in good 
accordance with a linguistic hypothesis concerning the final stages 
of the settlement of Herero-speaking peoples in which both 
language and culture contact as well as genetic admixture play 
important roles. Analyses of the Y-chromosomal diversity will shed 
further light on these processes. 

Supporting Information 

File S1 Contains the files: Figure S1: Schematic phylogeny 
of the African haplogroups found in the 982 individuals of our 
dataset. The reference sequences RSRS (Behar et al. 2012) and 
rCRS (corresponding to European haplogroup H, Andrews et al. 
1999) are indicated with a black line. Dashed lines indicate nodes 
with low posterior probability for which the topology could not be 
resolved. The width of the triangles is proportional to the 
frequency of a given haplogroup in the dataset. Haplogroups 
L3d and L3f, in yellow, are described in more detail in Figure S4A 
and S4B. Figure S2: CA plots based on basic haplogroup 
frequencies. Color coding by linguistic affiliation as specified in 
Table 1. A: Including all haplogroups and all populations. B: 
Including all haplogroups but excluding outlier populations. C: 
Excluding haplogroups L0d and L0k and excluding outliers. 
Figure S3: Haplogroup frequencies of important haplogroups in 
the populations studied here. A: Haplogroups L0d and L0k. B: 
Subhaplogroups L3d3a1 and L3f1b4a. The size of the circles is 
proportional to the sample size. Figure S4A: Bayesian tree of all 
African complete mtDNA sequences belonging to haplogroup 
L3d, and S4B: belonging to haplogroup L3f. The sequences 
included in these trees are the same as those included in the 
networks in Figure 4B and D. Branches discussed in the main text 
are highlighted. The posterior probability associated with major 
nodes is indicated in red font. Note the different time scale 
between the two trees. Figure S5: Bayesian Skyline Plots for the 
Himba, Herero, Kuvale and Damara. X axis: time in years ago. Y 
axis: effective population size per generation time. 
(PDF) 

Table S1 Information on country of sampling, language and 
population affiliation with assigned population as well as GenBank 
accession numbers for individual samples included in the study. 
(XLSX) 

Table S2 Frequency of L3d and L3f in the African populations 
included in the Surfer maps and their geographical coordinates. 
(XLSX) 

Table S3 Basic haplogroup frequencies in the populations 
studied here. 
(XLSX) 

Table S4 L0d haplotypes shared between Bantu speakers and 
Khoisan. 
(XLSX) 